Whole Genome Comparison using Commodity Workstations

Authors

  • Arpith C. Jacob
  • Sugata Sanyal
Abstract

Whole genome comparison consists of comparing or aligning two genome sequences in the hope that analogous functional or physical characteristics may be observed. Sequence comparison is done either with slow, rigorous algorithms or with faster heuristic approaches. However, due to the large size of genomic sequences, the capacity of current software is limited. In this work, we design a parallel-distributed system for the Smith-Waterman dynamic programming sequence comparison algorithm. We use subword parallelism to speed up sequence-to-sequence comparison using Streaming SIMD Extensions (SSE) on Intel Pentium processors. We compare two approaches, one requiring explicit data dependency handling and the other built to handle dependencies automatically. We achieve a speedup of 10-30 and establish the optimum conditions for each approach. We then implement a scalable and fault-tolerant distributed version of the genome comparison process on a network of workstations, based on a static work allocation algorithm. We achieve speeds upwards of 8000 MCUPS (million cell updates per second) on 64 workstations, making this one of the fastest implementations of the Smith-Waterman algorithm.
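For readers unfamiliar with the algorithm named in the abstract, the following is a minimal scalar sketch of the Smith-Waterman recurrence together with the MCUPS metric. It is not the authors' SSE or distributed implementation. The function names, the linear gap model, and the scoring values (match 2, mismatch -1, gap -1) are assumptions chosen purely for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of strings a and b.

    Assumed scoring: linear gap penalty and illustrative default values;
    the paper's actual scoring scheme is not stated in the abstract.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell depends on its upper-left, upper, and left neighbours.
            # These dependencies are what a SIMD version has to respect.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def mcups(len_a, len_b, seconds):
    """Million cell updates per second for one len_a x len_b comparison."""
    return len_a * len_b / seconds / 1e6


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))
    print(mcups(4000, 2000, 1.0))
```

Only two rows of the matrix are kept because the sketch reports the score alone; recovering the alignment itself would need the full matrix or a traceback structure.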

Similar resources

Applying SIMD Approach to Whole Genome Comparison on Commodity Hardware

Whole genome comparison compares (aligns) two genome sequences assuming that analogous characteristics may be found. In this paper, we present an SIMD version of the Smith-Waterman algorithm utilizing Streaming SIMD Extensions (SSE), running on Intel Pentium processors. We compare two approaches, one requiring explicit data dependency handling and one built to automatically handle dependencies ...


Parallel Programming Models and Paradigms

In the 1980s it was believed computer performance was best improved by creating faster and more efficient processors. This idea was challenged by parallel processing, which in essence means linking together two or more computers to jointly solve a computational problem. Since the early 1990s there has been an increasing trend to move away from expensive and specialized proprietary parallel superc...


XML Opportunities in Real Time Immersive Simulation & Visualization Based on Clusters of Commodity Computers

Real Time Immersive Simulation and Visualization applications have traditionally been powered by high-end graphics workstations or supercomputers. But recently, clusters of commodity computers (PCs, Macintoshes, low-cost workstations) have become a practical alternative. The advantages of a commodity cluster include low cost, flexibility, performance scalability and use of legacy systems. Th...


High Performance Protocols for Clusters of Commodity Workstations

Over the last few years, technological advances in microprocessor and network technology have dramatically improved the performance achieved in clusters of commodity workstations. Despite those impressive improvements, the cost of communication processing is still high. Traditional layered network protocols fail to achieve high throughput because they access data several times. Networ...


Computer-aided management of commodity parts-based supercomputers

Supercomputers are used to solve big problems – they are «nutcrackers» that support scientists, researchers and developers in decoding the human genome, simulating the weather and climate, creating virtual wind tunnels for planes and cars, and designing effective medicaments – the so-called «grand challenge problems». Smaller supercomputers are used for tasks that require more performance than...



Publication date: 2003